DSGEM: Dual scene graph enhancement module‐based visual question answering
Authors
Abstract
Visual Question Answering (VQA) aims to answer a text question appropriately by understanding the content of an image. Attention-based VQA models mine implicit relationships between objects according to feature similarity, which neglects explicit relationships between objects, for example, relative position. Most scene graph-based models exploit relative positions or visual relations to construct the scene graph, but they suffer from the semantic insufficiency of edge relations. Besides, the textual graph modality is often ignored in these works. In this article, a novel Dual Scene Graph Enhancement Module (DSGEM) is proposed that exploits relevant external knowledge to simultaneously build two interpretable graph structures of different modalities, which makes the reasoning process more logical and precise. Specifically, the authors build visual and textual scene graphs with the help of commonsense knowledge and syntactic structure, respectively, which explicitly endows each edge relation with specific semantics. Then, two scene graph enhancement modules are proposed to propagate the involved structural knowledge and guide the interaction between image and question features (nodes). Finally, the authors embed these modules into existing VQA models to introduce explicit relational reasoning ability. Experimental results on both the VQA v2 and OK-VQA datasets show that DSGEM is effective and compatible with various VQA architectures.
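The central mechanism described in the abstract, letting explicit edge relations (relative positions, syntactic dependencies) bias how node features interact, can be illustrated with a short sketch. The following is a minimal, assumed PyTorch formulation, not the authors' released code; the class and parameter names (SceneGraphEnhancer, rel_emb, num_relations) are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SceneGraphEnhancer(nn.Module):
    """Propagates explicit edge-relation semantics into node features."""
    def __init__(self, node_dim: int, num_relations: int):
        super().__init__()
        # Hypothetical design: each explicit relation (e.g. "left of",
        # a syntactic dependency) gets a learned embedding, giving edges
        # specific semantics beyond feature similarity.
        self.rel_emb = nn.Embedding(num_relations, node_dim)
        self.q_proj = nn.Linear(node_dim, node_dim)
        self.k_proj = nn.Linear(node_dim, node_dim)
        self.v_proj = nn.Linear(node_dim, node_dim)

    def forward(self, nodes: torch.Tensor, rel_ids: torch.Tensor,
                adj: torch.Tensor) -> torch.Tensor:
        # nodes:   (N, d) object or word features
        # rel_ids: (N, N) integer relation label per edge
        # adj:     (N, N) 1 where an edge exists, 0 otherwise
        q, k, v = self.q_proj(nodes), self.k_proj(nodes), self.v_proj(nodes)
        rel = self.rel_emb(rel_ids)                       # (N, N, d)
        # Bias pairwise attention by the edge's relation embedding, so the
        # explicit structure, not just similarity, guides node interaction.
        scores = (q.unsqueeze(1) * (k.unsqueeze(0) + rel)).sum(-1)
        scores = scores / nodes.size(-1) ** 0.5
        scores = scores.masked_fill(adj == 0, float('-inf'))
        attn = torch.nan_to_num(F.softmax(scores, dim=-1))  # isolated nodes -> 0
        return nodes + attn @ v                           # residual update

Under this reading, a model would instantiate one such module per modality, one over object regions with commonsense relations and one over question words with dependency relations, and apply both before cross-modal fusion.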
Similar resources
Dual Attention Network for Visual Question Answering
Visual Question Answering (VQA) is a popular research problem that involves inferring answers to natural language questions about a given visual scene. Recent neural network approaches to VQA use attention to select relevant image features based on the question. In this paper, we propose a novel Dual Attention Network (DAN) that not only attends to image features, but also to question features....
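The "dual" idea sketched in this teaser, attending over image features and over question features separately before fusing, can be summarized in a few lines. This is a hedged illustration under assumed shapes, not the DAN authors' implementation; attend, image_feats, and question_feats are hypothetical names.

import torch
import torch.nn.functional as F

def attend(query: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
    # query: (d,) summary vector; context: (M, d) features to attend over.
    scores = context @ query / context.size(-1) ** 0.5  # similarity per item
    weights = F.softmax(scores, dim=0)                  # attention distribution
    return weights @ context                            # weighted summary, (d,)

# image_feats: (R, d) region features; question_feats: (T, d) word features
# v = attend(question_feats.mean(dim=0), image_feats)   # question -> image
# q = attend(image_feats.mean(dim=0), question_feats)   # image -> question
# joint = v * q                                         # simple multiplicative fusion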
Dual Recurrent Attention Units for Visual Question Answering
We propose an architecture for VQA which utilizes recurrent layers to generate visual and textual attention. The memory characteristic of the proposed recurrent attention units offers a rich joint embedding of visual and textual features and enables the model to reason relations between several parts of the image and question. Our single model outperforms the first place winner on the VQA 1.0 d...
Investigating Embedded Question Reuse in Question Answering
This paper presents a novel method in question answering (QA) that enables a QA system to gain performance by reusing information from the answer to one question to answer another, related question. Our analysis shows that a pair of questions in a general open-domain QA setting can have an embedding relation through their mentions of noun phrase expressions. We present methods f...
Revisiting Visual Question Answering Baselines
Visual question answering (VQA) is an interesting learning setting for evaluating the abilities and shortcomings of current systems for image understanding. Many of the recently proposed VQA systems include attention or memory mechanisms designed to support “reasoning”. For multiple-choice VQA, nearly all of these systems train a multi-class classifier on image and question features to predict ...
iVQA: Inverse Visual Question Answering
In recent years, visual question answering (VQA) has become topical as a long-term goal to drive computer vision and multi-disciplinary AI research. The premise of VQA's significance is that both the image and the textual question need to be well understood and mutually grounded in order to infer the correct answer. However, current VQA models perhaps 'understand' less than initially hoped, and in...
Journal
Journal title: IET Computer Vision
Year: 2023
ISSN: 1751-9632, 1751-9640
DOI: https://doi.org/10.1049/cvi2.12186